Enable multiple treated units in synthetic control quasi experiments #494

Open: wants to merge 31 commits into base: main

drbenvincent
Collaborator

@drbenvincent drbenvincent commented Jun 28, 2025

Closes #456

Changes across quite a few files, so I'll give my PR summary and include a copilot generated summary in case it's useful.

Ben's PR summary

The main aim of this PR is to enable analysis of synthetic control experiments with multiple treated units. Previously this quasi-experimental situation could only be handled by a workaround (outlined in the Multi-cell geolift analysis notebook): iterating over each treated unit and running a separate, independent analysis for each.

This PR enables one model to be used when there are either one or multiple treated units. The construction of the model WeightedSumFitter still corresponds to an unpooled model, but I am holding off from exploring partial pooling because we have a PR about to be merged which enables user-provided priors (#488) and I'd rather do it using that approach.
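To illustrate what "unpooled" means here, the following is a minimal NumPy sketch, not the actual `WeightedSumFitter` code. It assumes each treated unit gets its own independent weight vector over the control units, with weights that are non-negative and sum to 1 (drawn from a flat Dirichlet purely for illustration). The function name `synthetic_control_mean` is hypothetical and is not part of CausalPy.

```python
import numpy as np


def synthetic_control_mean(X, beta):
    """Weighted sum of control units for every treated unit.

    X:    (n_obs, n_controls) outcomes of the control units
    beta: (n_treated, n_controls) weights, one row per treated unit
    Returns an (n_obs, n_treated) array of synthetic control means.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if X.ndim != 2 or beta.ndim != 2 or X.shape[1] != beta.shape[1]:
        raise ValueError(
            "expected X (n_obs, n_controls) and beta (n_treated, n_controls)"
        )
    return X @ beta.T


rng = np.random.default_rng(0)
n_obs, n_controls, n_treated = 50, 4, 2

# Simulated control-unit outcomes (illustrative data only).
X = rng.normal(size=(n_obs, n_controls))

# Unpooled weights: one independent draw per treated unit (assumption).
beta = rng.dirichlet(np.ones(n_controls), size=n_treated)

# One call covers all treated units...
mu = synthetic_control_mean(X, beta)

# ...and matches the older style of looping over treated units one at a time.
looped = np.column_stack([X @ beta[j] for j in range(n_treated)])
```

Because the rows of `beta` share no parameters, the single matrix product gives the same means as running one analysis per treated unit. Passing a single row, for example `beta[:1]`, returns an array of shape `(n_obs, 1)`, so the single-unit case needs no separate code path.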

My initial implementation led to complex code because the WeightedSumFitter model (used for synthetic control) was the only one which had a 2D likelihood (dims of ["obs_ind", "treated_units"]). All the other models had a 1D likelihood (dims of ["obs_ind"]). So there was a lot of branching and dealing with special cases. So another large change introduced by this PR is that all likelihood terms are now 2D. This is why there are code changes beyond the WeightedSumFitter and SyntheticControl classes.
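The benefit of a uniform 2D shape can be sketched in plain NumPy. This is an illustration under assumptions rather than code from the PR: the helpers `as_2d_outcome` and `rmse_per_unit` are hypothetical names, and the only idea borrowed from the PR is that every outcome array carries an `obs_ind` axis followed by a `treated_units` axis.

```python
import numpy as np


def as_2d_outcome(y):
    """Return y with shape (n_obs, n_treated_units).

    A 1D array of shape (n_obs,) is promoted to (n_obs, 1), so a single
    treated unit is just the special case n_treated_units == 1.
    """
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, np.newaxis]
    if y.ndim != 2:
        raise ValueError(f"expected a 1D or 2D array, got {y.ndim}D")
    return y


def rmse_per_unit(y, mu):
    """Root-mean-square error for each treated unit.

    Written once against the 2D shape: the reduction is always over
    axis 0 (obs_ind), with no branching on single vs multiple units.
    """
    y2, mu2 = as_2d_outcome(y), as_2d_outcome(mu)
    return np.sqrt(np.mean((y2 - mu2) ** 2, axis=0))


# Single treated unit supplied as 1D data.
single = rmse_per_unit([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# Two treated units supplied as 2D data.
multi = rmse_per_unit(np.ones((3, 2)), np.zeros((3, 2)))
```

Downstream code then always receives one value per treated unit: `single` has shape `(1,)` and `multi` has shape `(2,)`.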

We've also got additional (vibe-coded, but manually examined) tests to cover both the single and multi treated unit cases for synthetic control.

I've also re-built the UML diagrams and relevant notebooks for the docs. The main changes are in the Multi-cell geolift analysis notebook, which is updated to reflect the new functionality.

Copilot generated PR summary

This pull request introduces changes across multiple causal inference experiment modules to improve compatibility with multi-dimensional data structures and ensure proper handling of single treated units. Key updates include modifying data array dimensions, adding new coordinates, and refining plotting methods to handle single-unit data.

Changes to Data Array Structures:

  • causalpy/experiments/diff_in_diff.py: Updated self.y and COORDS to include a new dimension treated_units for better handling of multi-dimensional data.
  • causalpy/experiments/interrupted_time_series.py: Modified self.pre_y and self.post_y to retain 2D shapes and added treated_units as a coordinate. Adjusted COORDS for PyMCModel compatibility.
  • causalpy/experiments/prepostnegd.py: Updated self.y and COORDS to include treated_units for improved data structure handling.
  • causalpy/experiments/regression_discontinuity.py and causalpy/experiments/regression_kink.py: Added treated_units dimension and coordinate to self.y and updated COORDS.

Adjustments to Plotting Methods:

Compatibility with Single-Unit Data:


📚 Documentation preview 📚: https://causalpy--494.org.readthedocs.build/en/494/

codecov bot commented Jun 28, 2025

Codecov Report

Attention: Patch coverage is 98.50374% with 6 lines in your changes missing coverage. Please review.

Project coverage is 95.13%. Comparing base (fdce5b0) to head (192327d).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| causalpy/tests/test_pymc_models.py | 98.20% | 3 Missing ⚠️ |
| causalpy/pymc_models.py | 95.74% | 2 Missing ⚠️ |
| causalpy/experiments/synthetic_control.py | 97.50% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #494      +/-   ##
==========================================
+ Coverage   94.59%   95.13%   +0.54%     
==========================================
  Files          28       28              
  Lines        2053     2384     +331     
==========================================
+ Hits         1942     2268     +326     
- Misses        111      116       +5     

@drbenvincent
Collaborator Author

Note to self

Relatively happy with where this is at now.

I should do more manual inspection of the tests (which were vibe coded).

There is definitely scope to remove some conditional branching if we make the likelihood of all models 2-dimensional, so that its shape changes from (n_obs,) to (n_obs, 1). But I think I'll potentially leave that for another PR.

@drbenvincent drbenvincent requested a review from Copilot June 29, 2025 13:07
@Copilot Copilot AI left a comment

Pull Request Overview

This PR enables handling multiple treated units throughout the synthetic control workflow by updating documentation, extending the PyMC models, and adding comprehensive multi-unit tests.

  • Added sphinx-togglebutton for interactive docs
  • Extended PyMCModel and WeightedSumFitter to accept and process multiple treated units
  • Updated SyntheticControl class and added end-to-end tests for multi-unit scenarios

Reviewed Changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Added sphinx-togglebutton to docs dependencies |
| docs/source/conf.py | Registered sphinx_togglebutton extension |
| causalpy/tests/test_pymc_models.py | Added fixtures and tests for multi-unit WeightedSumFitter |
| causalpy/tests/test_integration_pymc_examples.py | Added fixtures and integration tests for multi-unit SyntheticControl |
| causalpy/pymc_models.py | Updated _data_setter, predict, score, and coefficient printing to support multi-unit |
| causalpy/experiments/synthetic_control.py | Renamed dims (control_units → coeffs), parameterized plots and data getters for treated units |

The likelihood of all models is now 2-dimensional. This means we no longer need conditional branching for single vs multiple treated units, so we've been able to remove a lot of the code in PyMCModel. This change touches a number of experiment classes which are not related to synthetic control.
@drbenvincent
Collaborator Author

That last commit was quite a big one. Model likelihoods are now 2-dimensional. This means we can avoid a lot of conditional branching based on single vs multiple treated unit situations.

@drbenvincent drbenvincent added the enhancement (New feature or request) and geo project (Related to geo-testing) labels Jul 5, 2025
@drbenvincent drbenvincent marked this pull request as ready for review July 5, 2025 10:03
Labels
enhancement New feature or request geo project Related to geo-testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade synthetic control to model multiple treated units
1 participant